Overview

Dataset statistics

Number of variables: 9
Number of observations: 409
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 28.9 KiB
Average record size in memory: 72.3 B

Variable types

Numeric: 7
Categorical: 2

Warnings

ChEMBL ID has a high cardinality: 409 distinct values (High cardinality)
Smiles has a high cardinality: 409 distinct values (High cardinality)
HBA is highly correlated with HBD and 1 other field (High correlation)
HBD is highly correlated with HBA and 1 other field (High correlation)
PSA is highly correlated with HBA and 1 other field (High correlation)
ChEMBL ID is uniformly distributed (Uniform)
Smiles is uniformly distributed (Uniform)
df_index has unique values (Unique)
ChEMBL ID has unique values (Unique)
Smiles has unique values (Unique)
HBD has 54 (13.2%) zeros (Zeros)

Reproduction

Analysis started: 2022-01-15 12:13:52.875712
Analysis finished: 2022-01-15 12:18:00.161833
Duration: 4 minutes and 7.29 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 409
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 470.0757946
Minimum: 2
Maximum: 988
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 KiB

Quantile statistics

Minimum: 2
5-th percentile: 43.2
Q1: 195
Median: 442
Q3: 748
95-th percentile: 946.2
Maximum: 988
Range: 986
Interquartile range (IQR): 553

Descriptive statistics

Standard deviation: 301.3298574
Coefficient of variation (CV): 0.6410239814
Kurtosis: -1.307572731
Mean: 470.0757946
Median Absolute Deviation (MAD): 278
Skewness: 0.1500746574
Sum: 192261
Variance: 90799.68297
Monotonicity: Strictly increasing
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
2                   1      0.2%
623                 1      0.2%
656                 1      0.2%
649                 1      0.2%
647                 1      0.2%
641                 1      0.2%
637                 1      0.2%
633                 1      0.2%
630                 1      0.2%
627                 1      0.2%
Other values (399)  399    97.6%

Minimum 10 values

Value  Count  Frequency (%)
2      1      0.2%
4      1      0.2%
7      1      0.2%
8      1      0.2%
9      1      0.2%
10     1      0.2%
11     1      0.2%
12     1      0.2%
14     1      0.2%
15     1      0.2%

Maximum 10 values

Value  Count  Frequency (%)
988    1      0.2%
983    1      0.2%
980    1      0.2%
979    1      0.2%
977    1      0.2%
975    1      0.2%
973    1      0.2%
972    1      0.2%
971    1      0.2%
969    1      0.2%

ChEMBL ID
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 409
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Value               Count
CHEMBL67391         1
CHEMBL3818875       1
CHEMBL1256362       1
CHEMBL3753077       1
CHEMBL1672002       1
Other values (404)  404

Length

Max length: 13
Median length: 12
Mean length: 12.36919315
Min length: 9

Characters and Unicode

Total characters: 5059
Distinct characters: 16
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 409
Unique (%): 100.0%

Sample

1st row: CHEMBL394875
2nd row: CHEMBL200381
3rd row: CHEMBL502351
4th row: CHEMBL492572
5th row: CHEMBL492591

Common Values

Value               Count  Frequency (%)
CHEMBL67391         1      0.2%
CHEMBL3818875       1      0.2%
CHEMBL1256362       1      0.2%
CHEMBL3753077       1      0.2%
CHEMBL1672002       1      0.2%
CHEMBL473159        1      0.2%
CHEMBL3287735       1      0.2%
CHEMBL4224714       1      0.2%
CHEMBL2430359       1      0.2%
CHEMBL375270        1      0.2%
Other values (399)  399    97.6%

Length

Histogram of lengths of the category

Value               Count  Frequency (%)
chembl567303        1      0.2%
chembl600764        1      0.2%
chembl91485         1      0.2%
chembl470334        1      0.2%
chembl464552        1      0.2%
chembl2178284       1      0.2%
chembl304087        1      0.2%
chembl2424812       1      0.2%
chembl2029422       1      0.2%
chembl1917204       1      0.2%
Other values (399)  399    97.6%

Most occurring characters

Value             Count  Frequency (%)
C                 409    8.1%
H                 409    8.1%
E                 409    8.1%
M                 409    8.1%
B                 409    8.1%
L                 409    8.1%
2                 350    6.9%
3                 321    6.3%
1                 313    6.2%
4                 265    5.2%
Other values (6)  1356   26.8%

Most occurring categories

Value             Count  Frequency (%)
Decimal Number    2605   51.5%
Uppercase Letter  2454   48.5%

Most frequent character per category

Decimal Number

Value  Count  Frequency (%)
2      350    13.4%
3      321    12.3%
1      313    12.0%
4      265    10.2%
0      258    9.9%
5      249    9.6%
9      224    8.6%
7      224    8.6%
6      217    8.3%
8      184    7.1%

Uppercase Letter

Value  Count  Frequency (%)
C      409    16.7%
H      409    16.7%
E      409    16.7%
M      409    16.7%
B      409    16.7%
L      409    16.7%

Most occurring scripts

Value   Count  Frequency (%)
Common  2605   51.5%
Latin   2454   48.5%

Most frequent character per script

Common

Value  Count  Frequency (%)
2      350    13.4%
3      321    12.3%
1      313    12.0%
4      265    10.2%
0      258    9.9%
5      249    9.6%
9      224    8.6%
7      224    8.6%
6      217    8.3%
8      184    7.1%

Latin

Value  Count  Frequency (%)
C      409    16.7%
H      409    16.7%
E      409    16.7%
M      409    16.7%
B      409    16.7%
L      409    16.7%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  5059   100.0%

Most frequent character per block

ASCII

Value             Count  Frequency (%)
C                 409    8.1%
H                 409    8.1%
E                 409    8.1%
M                 409    8.1%
B                 409    8.1%
L                 409    8.1%
2                 350    6.9%
3                 321    6.3%
1                 313    6.2%
4                 265    5.2%
Other values (6)  1356   26.8%

AlogP
Real number (ℝ)

Distinct: 310
Distinct (%): 75.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.52405868
Minimum: -2.32
Maximum: 9.01
Zeros: 0
Zeros (%): 0.0%
Negative: 20
Negative (%): 4.9%
Memory size: 3.3 KiB

Quantile statistics

Minimum: -2.32
5-th percentile: 0.146
Q1: 2.6
Median: 3.65
Q3: 4.72
95-th percentile: 6.304
Maximum: 9.01
Range: 11.33
Interquartile range (IQR): 2.12

Descriptive statistics

Standard deviation: 1.833432615
Coefficient of variation (CV): 0.5202616589
Kurtosis: 0.713532252
Mean: 3.52405868
Median Absolute Deviation (MAD): 1.07
Skewness: -0.4876363405
Sum: 1441.34
Variance: 3.361475153
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
3.19                5      1.2%
3.12                4      1.0%
2.73                4      1.0%
3.47                4      1.0%
4.76                3      0.7%
5.18                3      0.7%
5.1                 3      0.7%
4.26                3      0.7%
2.88                3      0.7%
4.23                3      0.7%
Other values (300)  374    91.4%

Minimum 10 values

Value  Count  Frequency (%)
-2.32  1      0.2%
-2.3   1      0.2%
-2.06  1      0.2%
-1.94  1      0.2%
-1.77  1      0.2%
-1.5   1      0.2%
-1.39  1      0.2%
-1.38  1      0.2%
-1.22  1      0.2%
-1.08  1      0.2%

Maximum 10 values

Value  Count  Frequency (%)
9.01   1      0.2%
8.37   1      0.2%
7.46   1      0.2%
7.36   1      0.2%
7.29   1      0.2%
7.28   1      0.2%
7.09   1      0.2%
7.04   1      0.2%
7      1      0.2%
6.99   1      0.2%

PSA
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 363
Distinct (%): 88.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 76.95643032
Minimum: 0
Maximum: 298.14
Zeros: 2
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 25.504
Q1: 52.31
Median: 74.85
Q3: 96.35
95-th percentile: 135.71
Maximum: 298.14
Range: 298.14
Interquartile range (IQR): 44.04

Descriptive statistics

Standard deviation: 36.10628212
Coefficient of variation (CV): 0.4691782346
Kurtosis: 4.15544506
Mean: 76.95643032
Median Absolute Deviation (MAD): 22.2
Skewness: 1.165089748
Sum: 31475.18
Variance: 1303.663608
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
23.47               4      1.0%
58.2                4      1.0%
12.47               3      0.7%
49.33               3      0.7%
23.55               3      0.7%
74.85               3      0.7%
37.3                3      0.7%
64.35               3      0.7%
50.94               2      0.5%
40.46               2      0.5%
Other values (353)  379    92.7%

Minimum 10 values

Value  Count  Frequency (%)
0      2      0.5%
9.72   1      0.2%
12.03  2      0.5%
12.47  3      0.7%
15.27  1      0.2%
15.71  2      0.5%
20.23  1      0.2%
21.26  1      0.2%
23.47  4      1.0%
23.55  3      0.7%

Maximum 10 values

Value   Count  Frequency (%)
298.14  1      0.2%
218.99  1      0.2%
218.8   1      0.2%
208.65  1      0.2%
191.6   1      0.2%
178.3   1      0.2%
169.26  1      0.2%
166.75  1      0.2%
164.36  1      0.2%
159.51  1      0.2%

HBA
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 15
Distinct (%): 3.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.946210269
Minimum: 0
Maximum: 18
Zeros: 2
Zeros (%): 0.5%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 3
Median: 5
Q3: 6
95-th percentile: 9
Maximum: 18
Range: 18
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.276702458
Coefficient of variation (CV): 0.4602922913
Kurtosis: 3.195541801
Mean: 4.946210269
Median Absolute Deviation (MAD): 2
Skewness: 0.9620279714
Sum: 2023
Variance: 5.183374083
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=15)

Common values

Value             Count  Frequency (%)
5                 72     17.6%
6                 64     15.6%
3                 62     15.2%
4                 62     15.2%
2                 47     11.5%
7                 41     10.0%
8                 29     7.1%
9                 15     3.7%
1                 8      2.0%
0                 2      0.5%
Other values (5)  7      1.7%

Minimum 10 values

Value  Count  Frequency (%)
0      2      0.5%
1      8      2.0%
2      47     11.5%
3      62     15.2%
4      62     15.2%
5      72     17.6%
6      64     15.6%
7      41     10.0%
8      29     7.1%
9      15     3.7%

Maximum 10 values

Value  Count  Frequency (%)
18     1      0.2%
16     1      0.2%
13     1      0.2%
11     2      0.5%
10     2      0.5%
9      15     3.7%
8      29     7.1%
7      41     10.0%
6      64     15.6%
5      72     17.6%

HBD
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 11
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.733496333
Minimum: 0
Maximum: 12
Zeros: 54
Zeros (%): 13.2%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 2
95-th percentile: 4
Maximum: 12
Range: 12
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.370157337
Coefficient of variation (CV): 0.790401059
Kurtosis: 10.27217497
Mean: 1.733496333
Median Absolute Deviation (MAD): 1
Skewness: 2.144057009
Sum: 709
Variance: 1.877331128
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)

Common values

Value  Count  Frequency (%)
1      149    36.4%
2      118    28.9%
3      60     14.7%
0      54     13.2%
4      14     3.4%
5      8      2.0%
6      2      0.5%
7      1      0.2%
8      1      0.2%
9      1      0.2%

Minimum 10 values

Value  Count  Frequency (%)
0      54     13.2%
1      149    36.4%
2      118    28.9%
3      60     14.7%
4      14     3.4%
5      8      2.0%
6      2      0.5%
7      1      0.2%
8      1      0.2%
9      1      0.2%

Maximum 10 values

Value  Count  Frequency (%)
12     1      0.2%
9      1      0.2%
8      1      0.2%
7      1      0.2%
6      2      0.5%
5      8      2.0%
4      14     3.4%
3      60     14.7%
2      118    28.9%
1      149    36.4%

Smiles
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 409
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 3.3 KiB

Value  Count
O=C(OC(C(F)(F)F)C(F)(F)F)N1CCN(Cc2cccc(Oc3ccccc3)c2)CC1  1
COc1cc2ncnc(Nc3cccc(Br)c3)c2cc1OC.Cl  1
CC(C)(C)c1cc(C(=O)/C(C#N)=N/Nc2cccc(Cl)c2)no1  1
CCCCCCCCCCCCCCCCNc1ccc(C(=O)O)cc1  1
C=C[C@]1(C)CC[C@@H](C(=C)C)C[C@H]1C(=C)C  1
Other values (404)  404

Length

Max length: 223
Median length: 49
Mean length: 51.08557457
Min length: 11

Characters and Unicode

Total characters: 20894
Distinct characters: 33
Distinct categories: 8
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 409
Unique (%): 100.0%

Sample

1st row: CSC[C@H](N)C(=O)O
2nd row: N#Cc1cnc2cnc(NCc3cccnc3)cc2c1Nc1ccc(F)c(Cl)c1
3rd row: COc1ccc(-c2cnc3c(-c4cccc5ncccc45)cnn3c2)cc1
4th row: C[C@@H](CN1CCC(n2c(=O)[nH]c3cc(Cl)ccc32)CC1)NC(=O)c1ccc2ccccc2c1
5th row: O=C(O)c1cc2occc2[nH]1

Common Values

Value  Count  Frequency (%)
O=C(OC(C(F)(F)F)C(F)(F)F)N1CCN(Cc2cccc(Oc3ccccc3)c2)CC1  1  0.2%
COc1cc2ncnc(Nc3cccc(Br)c3)c2cc1OC.Cl  1  0.2%
CC(C)(C)c1cc(C(=O)/C(C#N)=N/Nc2cccc(Cl)c2)no1  1  0.2%
CCCCCCCCCCCCCCCCNc1ccc(C(=O)O)cc1  1  0.2%
C=C[C@]1(C)CC[C@@H](C(=C)C)C[C@H]1C(=C)C  1  0.2%
COc1ccccc1C1(O)CCN(CCCn2c3ccccc3c3ccccc32)CC1  1  0.2%
O=C(O)C1CN(Cc2ccc(-c3nc4cc(Cc5ccccc5F)ccc4s3)c(F)c2)C1  1  0.2%
CN(C)Cc1cccc(N/C(=C2\C(=O)Nc3cc(C(=O)N(C)C)ccc32)c2ccccc2)c1  1  0.2%
CC(C)Cc1ccc([C@@H](C)C(=O)O)cc1  1  0.2%
CC(C)c1cc(-c2ccc(F)c3ccccc23)nc(N)n1  1  0.2%
Other values (399)  399  97.6%

Length

Histogram of lengths of the category

Value  Count  Frequency (%)
cc(=o)ncc(=o)n1[c@h]2cc[c@@h]1c1cc(nc3ncc(c(f)(f)f)c(nc4ccc4)n3)ccc12  1  0.2%
ccnc(=o)c1cc2c(-c3cc(c(c)(c)o)ccc3oc3c(c)cc(f)cc3c)cn(c)c(=o)c2[nh]1  1  0.2%
cc(c)c1ccc(-c2cc(=o)c3ccccc3o2)cc1  1  0.2%
o=c(o)ccn(o)c(=o)ccccccccc1cc1  1  0.2%
coc1ccc(ccn(c)ccc2ccc(oc)c(oc)c2)cc1oc.cl  1  0.2%
coc1ccc(-c2coc3cc(o)cc(o)c3c2=o)cc1  1  0.2%
o=c(o)/c=c\c(=o)o.o=c(c1ccc(occcc2c[nh]cn2)cc1)c1cc1  1  0.2%
ccoc1ccccc1-n1c(c(c)n2ccn(c(=o)coc3ccc(cl)cc3)cc2)nc2ccccc2c1=o  1  0.2%
o=c1cc(n2ccocc2)oc2c(-c3ccccc3)cccc12  1  0.2%
oc[c@h]1o[c@@h](oc2cc3c(o)cc(o)cc3[o+]c2-c2ccc(o)c(o)c2)[c@h](o)[c@@h](o)[c@@h]1o  1  0.2%
Other values (399)  399  97.6%

Most occurring characters

Value              Count  Frequency (%)
c                  5048   24.2%
C                  3392   16.2%
(                  1958   9.4%
)                  1958   9.4%
1                  1180   5.6%
O                  1082   5.2%
2                  968    4.6%
N                  727    3.5%
n                  631    3.0%
=                  598    2.9%
Other values (23)  3352   16.0%

Most occurring categories

Value              Count  Frequency (%)
Uppercase Letter   5913   28.3%
Lowercase Letter   5895   28.2%
Decimal Number     2838   13.6%
Open Punctuation   2405   11.5%
Close Punctuation  2405   11.5%
Other Punctuation  646    3.1%
Math Symbol        617    3.0%
Dash Punctuation   175    0.8%

Most frequent character per category

Uppercase Letter

Value  Count  Frequency (%)
C      3392   57.4%
O      1082   18.3%
N      727    12.3%
H      377    6.4%
F      244    4.1%
S      74     1.3%
B      16     0.3%
I      1      < 0.1%

Lowercase Letter

Value  Count  Frequency (%)
c      5048   85.6%
n      631    10.7%
l      129    2.2%
s      40     0.7%
o      29     0.5%
r      16     0.3%
a      2      < 0.1%

Decimal Number

Value  Count  Frequency (%)
1      1180   41.6%
2      968    34.1%
3      474    16.7%
4      166    5.8%
5      46     1.6%
6      4      0.1%

Other Punctuation

Value  Count  Frequency (%)
@      512    79.3%
.      51     7.9%
/      45     7.0%
#      28     4.3%
\      10     1.5%

Open Punctuation

Value  Count  Frequency (%)
(      1958   81.4%
[      447    18.6%

Close Punctuation

Value  Count  Frequency (%)
)      1958   81.4%
]      447    18.6%

Math Symbol

Value  Count  Frequency (%)
=      598    96.9%
+      19     3.1%

Dash Punctuation

Value  Count  Frequency (%)
-      175    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Latin   11808  56.5%
Common  9086   43.5%

Most frequent character per script

Common

Value             Count  Frequency (%)
(                 1958   21.5%
)                 1958   21.5%
1                 1180   13.0%
2                 968    10.7%
=                 598    6.6%
@                 512    5.6%
3                 474    5.2%
[                 447    4.9%
]                 447    4.9%
-                 175    1.9%
Other values (8)  369    4.1%

Latin

Value             Count  Frequency (%)
c                 5048   42.8%
C                 3392   28.7%
O                 1082   9.2%
N                 727    6.2%
n                 631    5.3%
H                 377    3.2%
F                 244    2.1%
l                 129    1.1%
S                 74     0.6%
s                 40     0.3%
Other values (5)  64     0.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  20894  100.0%

Most frequent character per block

ASCII

Value              Count  Frequency (%)
c                  5048   24.2%
C                  3392   16.2%
(                  1958   9.4%
)                  1958   9.4%
1                  1180   5.6%
O                  1082   5.2%
2                  968    4.6%
N                  727    3.5%
n                  631    3.0%
=                  598    2.9%
Other values (23)  3352   16.0%

MW
Real number (ℝ≥0)

Distinct: 405
Distinct (%): 99.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 374.9159487
Minimum: 89.094
Maximum: 923.449
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 KiB

Quantile statistics

Minimum: 89.094
5-th percentile: 176.9946
Q1: 296.414
Median: 369.343
Q3: 452.467
95-th percentile: 583.3322
Maximum: 923.449
Range: 834.355
Interquartile range (IQR): 156.053

Descriptive statistics

Standard deviation: 124.9275481
Coefficient of variation (CV): 0.333214814
Kurtosis: 1.118962417
Mean: 374.9159487
Median Absolute Deviation (MAD): 78.177
Skewness: 0.4405301814
Sum: 153340.623
Variance: 15606.89227
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
369.343             2      0.5%
321.38              2      0.5%
163.173             2      0.5%
404.462             2      0.5%
516.667             1      0.2%
362.363             1      0.2%
308.337             1      0.2%
440.367             1      0.2%
337.375             1      0.2%
520.637             1      0.2%
Other values (395)  395    96.6%

Minimum 10 values

Value    Count  Frequency (%)
89.094   1      0.2%
96.133   1      0.2%
114.104  1      0.2%
115.132  1      0.2%
116.12   1      0.2%
117.148  1      0.2%
126.111  1      0.2%
127.168  1      0.2%
129.115  1      0.2%
136.154  1      0.2%

Maximum 10 values

Value    Count  Frequency (%)
923.449  1      0.2%
862.746  1      0.2%
789.099  1      0.2%
785.025  1      0.2%
669.777  1      0.2%
658.707  1      0.2%
637.744  1      0.2%
635.942  1      0.2%
622.84   1      0.2%
621.915  1      0.2%

pIC50
Real number (ℝ≥0)

Distinct: 273
Distinct (%): 66.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.355827231
Minimum: 2.065501549
Maximum: 10.1426675
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.3 KiB

Quantile statistics

Minimum: 2.065501549
5-th percentile: 3.862782522
Q1: 5
Median: 6.522878745
Q3: 7.657577319
95-th percentile: 8.821607809
Maximum: 10.1426675
Range: 8.077165955
Interquartile range (IQR): 2.657577319

Descriptive statistics

Standard deviation: 1.65184478
Coefficient of variation (CV): 0.2598945378
Kurtosis: -0.7937361524
Mean: 6.355827231
Median Absolute Deviation (MAD): 1.363177902
Skewness: -0.09665965835
Sum: 2599.533337
Variance: 2.728591178
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)

Common values

Value               Count  Frequency (%)
4                   14     3.4%
5                   14     3.4%
7                   6      1.5%
8.096910013         6      1.5%
6                   5      1.2%
8                   5      1.2%
7.397940009         5      1.2%
8.522878745         5      1.2%
7.795880017         5      1.2%
8.045757491         5      1.2%
Other values (263)  339    82.9%

Minimum 10 values

Value        Count  Frequency (%)
2.065501549  1      0.2%
2.301029996  2      0.5%
2.460000001  1      0.2%
2.823908741  1      0.2%
3            1      0.2%
3.086186148  1      0.2%
3.301029996  1      0.2%
3.406160339  1      0.2%
3.493494968  1      0.2%
3.494850022  1      0.2%

Maximum 10 values

Value        Count  Frequency (%)
10.1426675   1      0.2%
10.04575749  1      0.2%
9.920818754  1      0.2%
9.886056648  2      0.5%
9.522878745  1      0.2%
9.397940009  2      0.5%
9.301029996  3      0.7%
9.299988938  1      0.2%
9.180456064  1      0.2%
9.113509275  1      0.2%

Interactions

[Figure: pairwise interaction plots for the 7 numeric variables (49 panels)]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
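
As a rough illustration of the calculation described above, the following Python sketch computes r from scratch. The function name pearson_r is hypothetical and is not part of pandas-profiling; the example values are the PSA and HBA entries of the first ten rows listed in the Sample section, so the result is not the dataset-wide correlation.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products and squares are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Illustrative input: PSA and HBA from the ten "First rows" of the sample.
psa = [63.32, 86.52, 52.31, 70.13, 66.23, 96.28, 110.53, 58.20, 99.44, 82.67]
hba = [3, 6, 5, 4, 2, 6, 10, 2, 8, 7]
print(round(pearson_r(psa, hba), 3))
```

Rescaling either input with a positive factor, or shifting it by a constant, leaves the result unchanged, which is the invariance mentioned in the text.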

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
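
The pair-counting definition can be sketched directly in Python. The function name kendall_tau_a is hypothetical. It implements only the simple formula stated in the text (often called tau-a) with no correction for ties; library implementations may use a tie-corrected variant, so values can differ on heavily tied columns such as HBA or HBD.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped neighbour among four observations: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The double loop over pairs is quadratic in the number of observations, which is fine for a dataset of this size (409 rows).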

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

   df_index  ChEMBL ID      AlogP  PSA     HBA  HBD  Smiles  MW  pIC50
0  2         CHEMBL394875   -0.24  63.32   3    2    CSC[C@H](N)C(=O)O  352.397  4.200659
1  4         CHEMBL200381   5.04   86.52   6    2    N#Cc1cnc2cnc(NCc3cccnc3)cc2c1Nc1ccc(F)c(Cl)c1  151.121  7.301030
2  7         CHEMBL502351   4.62   52.31   5    0    COc1ccc(-c2cnc3c(-c4cccc5ncccc45)cnn3c2)cc1  380.806  5.522879
3  8         CHEMBL492572   4.59   70.13   4    2    C[C@@H](CN1CCC(n2c(=O)[nH]c3cc(Cl)ccc32)CC1)NC(=O)c1ccc2ccccc2c1  603.601  7.337242
4  9         CHEMBL492591   1.46   66.23   2    2    O=C(O)c1cc2occc2[nH]1  337.261  6.850781
5  10        CHEMBL494772   2.37   96.28   6    2    COc1cc(Cc2cnc(N)nc2N)c(C(C)C)cc1OC  465.554  5.700001
6  11        CHEMBL561103   3.74   110.53  10   1    COC(=O)Nc1ccc(-c2nc(N3CCOCC3)c3cnn(C4CCN(Cc5cccnc5)CC4)c3n2)cc1  509.393  8.337242
7  12        CHEMBL559525   4.69   58.20   2    2    O=C(NC(=O)c1c(F)cccc1Cl)NC1c2ccccc2-c2ccccc21  188.231  8.818156
8  14        CHEMBL397983   5.81   99.44   8    0    CCOc1ccc(-n2c([C@@H](C)N(Cc3cccnc3)C(=O)Cc3ccc(OC(F)(F)F)cc3)nc3ncccc3c2=O)cc1  387.867  8.096910
9  15        CHEMBL1096283  1.06   82.67   7    0    Cn1nc(-c2ccc(C(F)(F)F)cc2)nc2c(=O)n(C)c(=O)nc1-2  396.672  3.698970

Last rows

     df_index  ChEMBL ID      AlogP  PSA     HBA  HBD  Smiles  MW  pIC50
399  969       CHEMBL539313   1.77   33.62   3    1    CCC1(C2=NCCN2)Cc2ccccc2O1.Cl  189.127  5.200000
400  971       CHEMBL2430359  3.14   129.21  8    3    CN(c1ncccc1CNc1nc(Nc2ccc3c(c2)CC(=O)N3)ncc1C(F)(F)F)S(C)(=O)=O.O=S(=O)(O)c1ccccc1  214.648  7.853872
401  972       CHEMBL592374   2.85   12.03   1    1    Clc1ccc([C@]23CNC[C@H]2C3)cc1Cl  350.447  7.292430
402  973       CHEMBL600764   3.35   65.79   4    1    O=C(CN1CCN(C(=O)c2ccco2)CC1)Nc1cc(C(F)(F)F)ccc1Cl  300.266  5.000000
403  975       CHEMBL583042   5.43   102.93  5    4    Cc1cnc(Nc2ccc(F)cc2Cl)nc1-c1c[nH]c(C(=O)N[C@H](CO)c2cccc(Cl)c2)c1  497.599  7.318759
404  977       CHEMBL327002   5.37   138.18  7    3    Cc1cc(C)c(N(Cc2ccccc2)S(=O)(=O)c2ccc(OCCNC(=O)c3cc4ccccc4o3)cc2)c(C(=O)NO)c1  469.574  7.769551
405  979       CHEMBL254760   5.68   81.07   7    2    Cn1cnc(-c2cc3nccc(Oc4ccc(NC(=S)NC(=O)Cc5ccccc5)cc4F)c3s2)c1  250.294  7.537602
406  980       CHEMBL539507   3.33   12.47   2    0    C#CCN(C)CCCOc1ccc(Cl)cc1Cl.Cl  163.173  4.301030
407  983       CHEMBL1553629  5.35   48.13   2    2    CCc1c(C(=O)NCCc2ccc(N3CCCCC3)cc2)[nH]c2ccc(Cl)cc12  151.209  5.000000
408  988       CHEMBL473159   0.80   60.69   3    3    Oc1cc(O)cc(O)c1  367.788  3.406160